Deflation-Free Optimal Scoring
Sparse Optimal Scoring (SOS) reformulates linear discriminant analysis to enable feature selection through elastic net regularization, making it well-suited for high-dimensional settings where the number of features exceeds the number of observations. Most existing SOS methods use deflation-based strategies that compute discriminant vectors sequentially, which can propagate errors from one vector to the next and produce suboptimal solutions. We propose a novel approach that estimates all discriminant vectors simultaneously under an explicit global orthogonality constraint, which we call Deflation-Free Sparse Optimal Scoring (DFSOS). DFSOS combines Bregman iteration with orthogonality-constrained optimization, decomposing the problem into tractable subproblems for scoring vectors, discriminant vectors, and orthogonality enforcement. We establish convergence to stationary points of the augmented Lagrangian under mild conditions. Extensive experiments on synthetic data and real-world time-series data demonstrate that DFSOS achieves classification accuracy comparable to or better than existing deflation-based methods. These results indicate that deflation-free approaches offer a robust and effective framework for sparse discriminant analysis in high-dimensional problems.
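The abstract does not spell out the DFSOS subproblems, so the following is only a hedged sketch of the generic pattern it names: Bregman iteration that splits an orthogonality constraint into a smooth step, a projection step, and a Bregman-variable update, in the style of Lai and Osher's splitting-orthogonality-constraints (SOC) method. As an assumption, a toy quadratic objective 0.5 * ||Q - A||_F^2 stands in for the actual SOS loss, and the function names `project_orthonormal` and `soc_toy` are hypothetical.

```python
import numpy as np


def project_orthonormal(m):
    """Nearest matrix with orthonormal columns (Frobenius norm): U @ Vt from a thin SVD."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def soc_toy(a, r=1.0, n_iter=200):
    """Bregman splitting for the toy problem
        min_Q 0.5 * ||Q - A||_F^2  subject to  Q^T Q = I.
    The quadratic objective is a stand-in, not the DFSOS loss."""
    n, k = a.shape
    p = np.eye(n, k)        # initial iterate with orthonormal columns
    b = np.zeros_like(a)    # Bregman (scaled dual) variable
    q = p.copy()
    for _ in range(n_iter):
        # Smooth step: closed-form minimizer of
        # 0.5 * ||Q - A||^2 + (r / 2) * ||Q - P + B||^2
        q = (a + r * (p - b)) / (1.0 + r)
        # Orthogonality-enforcement step: project Q + B onto orthonormal columns
        p = project_orthonormal(q + b)
        # Bregman update: accumulate the constraint residual Q - P
        b = b + q - p
    return q, p


rng = np.random.default_rng(0)
a = rng.standard_normal((6, 3))
q, p = soc_toy(a)
print(np.linalg.norm(p - project_orthonormal(a)))
```

The projection step always returns a matrix with orthonormal columns, so orthogonality holds exactly at every iterate of `p`, while `q` carries the objective. For this toy objective the exact minimizer is `project_orthonormal(a)`, which makes the printed gap a convenient sanity check on the iteration.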
DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning
Firstly, many existing attribution techniques require either computing gradients [58] or querying the model multiple times [19], both of which are slow and computationally expensive. In contrast, ICL is often applied in real time to a large foundation model [12], which requires attribution approaches for ICL to be fast and efficient.
Additionally, we report the calibration performance of various competitive approaches. Briefly, calibration quantifies how closely a model's confidence matches its accuracy [Osborne, 1991]. To measure it, we employ the recently proposed Adaptive ECE (AdaECE) [Mukhoti et al., 2020]. For all methods, AdaECE is computed after performing temperature scaling [Guo et al., 2017]. Unfortunately, we could not get their code to work on C100, as the training procedure appeared to be unstable.
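To make the evaluation protocol concrete, here is a minimal sketch of AdaECE computed after temperature scaling. Assumptions: AdaECE is taken as the usual expected calibration error, but with bins chosen so that each holds roughly the same number of samples (here by splitting the confidence-sorted indices with `np.array_split`); 15 bins is an assumed default; and the temperature is fit by a simple grid search on validation negative log-likelihood, a simplification of the gradient-based NLL fit used for temperature scaling. The helper names are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of temperature-scaled logits."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ada_ece(probs, labels, n_bins=15):
    """Adaptive ECE: equal-mass bins over the top-class confidence."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    order = np.argsort(conf)
    n = len(conf)
    ece = 0.0
    for idx in np.array_split(order, n_bins):
        if idx.size == 0:  # more bins than samples leaves some bins empty
            continue
        gap = abs(correct[idx].mean() - conf[idx].mean())
        ece += idx.size / n * gap  # weight each bin by its share of samples
    return ece


def fit_temperature(logits, labels, grid=np.linspace(0.05, 10.0, 400)):
    """Pick the temperature that minimizes NLL on held-out data (grid search)."""
    best_t, best_nll = 1.0, np.inf
    rows = np.arange(len(labels))
    for t in grid:
        p = softmax(logits, t)
        nll = -np.mean(np.log(p[rows, labels] + 1e-12))
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t
```

In use, one would call `t = fit_temperature(val_logits, val_labels)` on a validation split and then report `ada_ece(softmax(test_logits, t), test_labels)`. Equal-mass binning keeps every bin populated even when confidences cluster near 1, which is the motivation for the adaptive variant over fixed-width bins.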